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Abstract 


Markov and semi-Markov processes are increasingly being used in the 
modeling of complex rcconfigurable systems (fault-tolerant computers). The 
estimation of the reliability (or some measure of performance) of the system 
reduces to solving the process for its state probabilities. Such a model may 
exhibit numerous states and complicated transition distributions, contributing 
to an expensive and numerically delicate solution procedure. Thus, when a 
system exhibits a decomposition property, either structurally (autonomous 
subsystems), or bchaviorally (component failure versus reconfiguration), it is 
desirable to exploit this decomposition in the reliability calculation. In 
interesting cases there can be failure states which arise from non-failure 
states of the subsystems. We present equations which allow the computation 
of failure probabilities of the total (combined) model without requiring a 
complete solution of the combined model. This material is presented within 
the context of closed-form functional representation of probabilities as util- 
ized in the Symbolic Hierarchical Automated Reliability and Performance 
Evaluator (SHARPE) tool. The techniques adopted enable one to compute 
such probability functions for a much wider class of systems at a reduced 
computational cost. Several examples show how the method is used, espe- 
cially in enhancing the versatility of the SHARPE tool. 


Introduction 


Modem reliability modeling practice involves several techniques, including Fault-Tree 
analysis and (semi)-Markov processes. Fault-Trees enable one to evaluate the impact of cer- 
tain dependencies within the entire system. For instance, the fact that the failure of a certain 
component results in the failure of higher-level component (sub-system) may be modeled. 
Markov and semi-Markov processes ( chains ) can depict more general types of dependency. 
For example, consider a triplex system consisting of 3 identical components A lt A 2 , Aj, each 
with failure rate X. Then system failure can be represented within the domain of Fault-Trees 
by a ’2 out of 3 gate’ as in Figure Int-la, or by a set of AND and OR gates (Figure Int-lb). 
Failure may also be depicted by a Markov process (Figure Int-2). With the rates indicated, 
the probability of failure at time t is the probability that the process is in state F at t . 

Another “fault-tolerant architecture’’ is a triplex which operates by (instantly) detecting 
a first fault and then reconfiguring to a simplex system. This is accomplished by unplugging 
the defective component and a randomly selected “good” component. There is no way, 
using only the three components in a logic gate (Fault-Tree) arrangement, to represent the 
event of system failure. But it is easy to give the corresponding Maikov process (Figure 2b). 
Thus the necessity for Markov models tends to come about when system reconfiguration is a 
characteristic feature. 

For mission-critical systems found in process-control, avionics, and so on, one is con- 
cerned with the reliability at some particular time (mission time). Given a Markov process 
that models a system, the unreliability or failure probability can be found by using a numeri- 
cal differential equation solver [Reibman & Trivedi]. Alternatively, one may wish to have 
this quantity in closed form as a function of time. 

Large models (with many states and transitions) arise naturally in the study of complex 
reconfigurable systems. The solution of such a model can be both expensive and time- 
consuming. However, when the model can be decomposed hierarchically into smaller 
models, the process of solving the smaller models is generally less expensive than solving the 
large one. The desired reliability number associated with the large number can be computed 
from quantities derived in the solution of the smaller ones. 

The major purpose of this article is to increase our understanding of hierarchical model- 
ing and to increase our capabilities for solving such models. Two equations, presented as 
(4.7) and (4.8), govern the method of solution by decomposition. 

In order to write down these decomposition relationships we must present the 
Chapman-Kolmo gorov equations somewhat differently than in previous literature. However, 
[Ross] gives a related treatment These C-K equations involve quantities (probability density 
functions) which are used in a new integral equation, (4.7). The solution of the integral 
equation is a function that expresses part of the failure probability present in a “combined 
model” that results from two smaller models. The failure mode involved here (in the com- 
bined model) does not generally arise from the failure modes of the smaller models. Thus 
we have developed a technique for combining models accurately, even though there may be 
some interaction between them, leading to new failure modes. 

The obvious advantage of the closed-form framework is that once the solution is found 
(presumably at significant computational cost), the reliability is easily calculated for any 
desired value of time t. In addition, it is easier to find sensitivity functions (with respect to 
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failure rates or other parameters). The Symbolic Hierarchical Automated Reliability and Per- 
formance Evaluator (SHARPE) program takes this closed-form approach [Sahncr & 
Trivedil]. The reliability functions encountered in SHARPE are the so-called “exponomial” 
functions and can be easily represented “symbolically”. They consist of probability distribu- 
tion functions of a particular algebraic foim to be described shortly. 

The SHARPE modeling framework is amenable to the use of hierarchical modeling 
techniques. SHARPE was intended to promote hierarchical decomposition. Much of the 
information needed in applying our decomposition method can be obtained from SHARPE, 
either directly as output from the program or after a modest amount of additional computa- 
tion. The examples (in section 6) that illustrate the use of our method arc made much clearer 
by adhering to the closed-form (SHARPE) framework; the reader can observe functions aris- 
ing in the solution process explicitly. 

For the remainder of the Introduction we examine the SHARPE methodology, closed- 
form solution, and hierarchical modeling in greater detail. Section 1 is a review of the basic 
concepts of stochastic process theory needed for the finite-state semi-Markov processes that 
we deal with. Section 2 gives a form of the Chapman-Kolmogorov integral equations for a 
chain, in a manner that provides quantities necessary in the study of hierarchical decomposi- 
tion. Section 3 presents the basic facts of the SHARPE solution method applied to certain 
types of chain. This is not to be interpreted as a literal description of code (the author is not 
a developer of SHARPE), but as background useful in judging what the program’s capabili- 
ties are, and what enhancements might be desirable. 

Section 4 presents an introductory example which shows how simple models can be 
built up into larger ones, and how repair complicates the decomposition issue. The problem 
is stated, of how to compute probabilities in a “combined model” (which may not even be 
semi-Markov). An integral equation is given whose solution answers this question. Section 
5 is a digression on constructing certain exponomial distributions, and section 6 provides two 
examples that illustrate the power of our method. 

Exponomial Distributions 

To construct these distributions in an algebraic manner, we take as base field R, the real 
numbers (an idealization of computer floating-point numbers). The set of functions 
(«“, sinew, cosa/, t } where aeR, generate a ring of functions, which is extended by linear- 
ity over R to form an algebra Exp . The subset of this algebra consisting of distribution is 
called Dexp , the exponomial distributions. 

A function FeDexp has the properties that 

1) F( 0) = 0, 

2) lim F{t)= 1, 

f — > <* 

3 ) F(t) is non-decreasing for t £ 0. 

Condition 2) can be relaxed to lim F(t) < 1 for certain applications. In that case we say 

t — ► ** 

that F is defective, or incomplete, and has mass = 1 - lim F(t) at It is true that, given 

/ -4 oo 

a finite state Markov process (chain) M, its unreliability function is a complete exponomial 
distribution provided that a certain technical condition holds. The condition is that every 
absorbing state is a failure state, and that whenever a state is exited, there is a non-zero pro- 
bability that it will never be visited again. Hence the “operational” states arc transient: call 
this the “transient state condition”. If we weaken this condition to say merely that every 


3 


failure state is an absorbing state, then the unreliability fiinction is a defective exponomial 
distribution. This defines a larger class of functions, which we call Dexp*. 

The SHARPE program calculates such an unreliability function in closed form for any 
Markov process satisfying the “transient state” criterion. In addition, SHARPE can find the 
unreliability distribution of certain semi-Markov processes. The process must be acyclic and 
the transition functions should be in Dexp . In a semi-Markov process, transition rates from 
one state (the present state) to another (the receiving slate) arc not constant, but depend upon 
the time elapsed since the system entered the “present state”. We will later show (in section 
3) how to use SHARPE to obtain more information about the chain (Markov or semi-Markov 
process) than is simply provided by the output of the program. For example, given a Markov 
process with cycles, SHARPE docs not furnish the probability function (not a distribution) of 
a transient state. We show how to modify the chain, and then apply the C-K equations to 
find this. It is necessary only to solve one convolutional integral equation, not a system of 
equations, and to perform a few simple manipulations using output generated by SHARPE. 
The integral equation is generally easy to solve using the Laplace transform. This approach 
was first used by Lotka in the theory of industrial replacement, where similar renewal-type 
integral equations also arise. See [Lotka]. The rigorous foundations of this solution method 
were brought together in [Fellcr2], using analytic techniques developed in [Churchill]. 

Hierarchical Modeling 

As already indicated, hierarchical modeling methods are built into SHARPE. For 
instance, one way to construct a semi-Markov process is by means of state transition func- 
tions (distributions) given by the failure distribution of a (constant-rate) Markov chain. That 
is, in order to know explicitly a transition function of the semi-Markov chain, the “low- 
level” constant-rate chain must be solved. One process is in a sense embedded in the other. 
For examples of this type see [Sahner & Trivedi2]. A “full model” could be constructed by 
expanding the states in the higher-level semi-Markov chain into several states of a Markov 
chain. In many cases this capability of solving the higher- and lower-level models separately 
results in less computation and greater numerical robustness. 

Another common class of decomposable systems consists of the Cartesian products. 
Given My,M 2 Markov chains, then N = M jXAf 2 is Markov. Selecting a state 
A = (A,B)eN, AeM ,, BeM 2 , one may modify N by removing all exit transitions from A 
to form N. Thus A is now an absorbing state. Define P A to be the distribution function 
(possibly defective) corresponding to A. This function could be interpreted as “the probabil- 
ity that by time T we have simultaneously been in state A of M y and state B of M 2 ”. We 
give several examples of how this scenario can arise in practice. The method we present 
allows one to find P A without having to solve the large chain N. It is only necessary to 
obtain certain information about the smaller chains My and M 2 , which can be found for 
example by using SHARPE. Then another integral equation, (4.7), must be solved, leading 
to the desired probability function P A . This integral equation is similar to the Chapman- 
Kolmogorov forward equation (see [Fcllcrl] p. 458), and may be solved by the Lotka-Fcller 
method. 

This decomposition approach has several advantages. Work involved in solving an 
arbitrary chain with n states increases as the cube (n 3 ). In the Markov case, this is essen- 
tially the work involved in finding the eigenvalues of an nxn matrix. Hence, where decom- 
position is possible, much less work is needed to find the desired probabilities. Furthermore, 
the Cartesian product of even a Markov chain My with Si, a semi-Markov chain, need not be 
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semi-Markov. In general, it is a stochastic process where transition rates depend on the time 
elapsed since entry into the state previous to the present state. Provided that one docs not 
wish to deal with the theory of such general processes, the decomposition method is essen- 
tially the only viable approach. 

SHARPE is a powerful tool for obtaining exponomial solutions of reliability models. 
The techniques suggested in this paper greatly expand the computational horizons of 
SHARPE in directions consistent with the hierarchical modeling philosophy, and thus have 
more than theoretical value. But the techniques may also be used without recourse to 
SHARPE, and using them in the context of numerical methods instead of closed-form solu- 
tions is a promising possibility as well. 


1. Stochastic processes and distributions 

We give a brief review of stochastic processes, with emphasis on the ones that arc of 
greatest interest to us, namely Markov and semi-Markov processes (or chains). Although the 
term chain is sometimes used for discrete-time systems, we use the term to denote a 
continuous-time process that is either Markov or semi-Markov. The probabilistic definition 
of a Markov chain proceeds as follows. One starts with a space of outcomes, or sample 
space S. A continuous-time stochastic process is, for each />0, a function X (t). The 
domain of the function is the sample space, and each image X(t)o, where oeS, is an integer 
j from 1, • • • , N, identifying a state. So one may say that at time r, the process or outcome 
o results in state j . It is most common to classify outcomes into events, and to assign proba- 
bilities to the various events. For a finite sample space where all outcomes are equally prob- 
able, of course 


P(E) = 


# of_ q in E 

# of o in S’ 


and the expression (which is a slight abuse of notation) P [X(f) = j] is the probability of the 
event consisting of all outcomes o such that X(t)a = j. The quantity P[X(t) = j], called the 
state probability for state j . One can define a finite Markov chain to be a finite-state sto- 
chastic process X such that, for any set of times / 0 <f i< • • • <t n <t , the conditional probabil- 
ity that X(t) = x given that X(t n ) = x H , X(f„_,) = • • • , X(t 0 ) = x Q , where 

x, x 0 , • • , x H arc certain states, is equal to the conditional probability that X{t) = x given 

that X(t n ) = x„. This is the characteristic memoryless property. This finite Markov chain 
has the additional property of time-homogeneity if the quantity 


Pij (t) = P[X(u +t) = j\X(u) = i] 

depends only on t and not on u, for all m> 0, i, j such that i * j. As usual, P[E\F] means 

P\EnF 1 

the probability of E given F , or — j— — — . 

Now if one defines 




f=0> 


this quantity may be interpreted as follows. The increase over a short interval dt, of the 
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probability of being in state j, due to the original probability of being in i, is equal to 
P[X(f + dt) = j and X(f) = /]. From the definition of \y, this equals X iy P [X (f) = i\dt. 

By a suitable modification of what we have just done, we may obtain the definition and 
some properties of a semi-Markov chain. We use the idea of conditional probability density 
function, for which see [Trivcdi]. Say that the process X enters stale Xj at time f y if 
X(tj) = Xj, and if there is an e>0 such that X(f y — 5) * Jt), for all 0 < 5 £ e. Also, if 
X(0) = xq, the process X entered Xq at r=0 (jc 0 was the initial state). Given a set of limes 
t 0 <t t < • • • <t n <t , consider the conditional density function of X entering x at t given that 
X entered x n at t H , X entered at r„_,, • • • , X entered jc 0 at to- If this is always equal 
to the density of X entering x at t given that X entered x n at t„ , the process is said to be 
semi-Markov. We have for each pair i, j of states a density function 

(1.1) Gjj(t,u)= density of X entering j at u+t 

conditional on X entering i at u. 

The time-homogeneous case occurs when Gy is independent of «>0 for all i, j, t>0. 
We shall henceforth refer to a time-homogeneous Markov or semi-Markov process as a 
chain. Such a chain, since Markov implies semi-Markov, is characterized by the functions 


(1.2) Gy(f) = density of X entering j at t given that 

X entered i at time 0. 


It has been shown (sec [Ross], p. 89) that certain other sets of functions serve to 
characterize a chain. Consider the (possibly defective) distribution Fy = P[X enters j at 
some time t, 0<t <, t, and X docs not enter any state at any time K, Ockct I X entered i at 

0 ]. In words, Fy is the probability, conditional on entering i at 0, of ending the sojourn in 

1 by a jump to j before time r. In his 1964 study [Fcller3] of the C-K equations. Feller 
makes use of Fy. Another class of functions which determine a chain is referred to as the 
transition distributions Cy. Described in words, Cy (/ ) is the probability that X will jump to 
j (first entry) by time t , given that X entered i at 0 and assuming that Cy is the only transi- 
tion out of state i. Thus Cy is a distribution valid in the absence of competing transitions. 
The distributions [Cy J, over all j , are assumed to correspond to the independent events of 
jumping from i to the various states j. The functions [Cy } are what is supplied to SHARPE 
when it is desired to “solve” a chain (determine the time-dependent probability functions of 
its states). 

A relation between Fy and Cy will be given subsequently. For instance, when the 
chain is Markov, we have 


(1.3) 


Cy(t) =\-e X "', and 


X... -5>»' 

k 


The Cy functions can be given in various ways. For certain purposes, it is not necessary to 
define them completely but rather it is enough to give their mean and variance as distribu- 
tions. This approach is used by the SURE [Butler & White] package to find upper and lower 
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bounds on the reliability of a system whose reconfiguration times are not exponentially distri- 
buted. The SHARPE package, on the other hand, expects to be provided with C,- ; as a func- 
tion in the class Dexp . In other words, 


(1.4) 


_ . . ™ k. b.t 

Cij(t) = Z a r te , 

r=l 


where k^. is a non-negative integer and a r and b r are real or complex numbers. Qy should 
be a complete distribution function, in particular real-valued; from this it is not hard to show 
that the terms with non-real coefficients a r , can be matched in pairs with indices r, r', such 
that k r =k r >, a r = a^, b r - V. where the bar denotes complex conjugation. This property 
will be referred to as the “conjugacy condition”. A typical expression would be 

(1.5) l-e~‘+ -«-(!*>] 


or 1-e l —ae 'sinr. 

Having made the requisite definitions, we introduce standard terminology relating to the 
classification of chains in order to simplify later exposition. 

Definition 1.6 A chain is ergodic if, given that it is in state j at time t, if k is another state, 
then there is a later time t k such that P[X(t k ) = k] > 0. 

Definition 1.7 A state k is absorbing if, given X(t) = k, then P[X(t') = j] = 0, for all 
t' > t and j * k. Thus an absorbing state, once entered, can never be left. 

Similarly, a subset of the set of states could form an absorbing subchain if once entered, it is 
never left. Gearly, a chain with an absorbing subchain that is not the whole chain cannot be 
ergodic. 

Definition 1.8 A state k is transient if there is a state j * k such that given X(t) = k, then 
for some t’ > t, P[X{t') = ;] > 0, but given X(t) = j , then for all t'Z t, P[X(t 0 = k] = 0. 
Definition 1.9 A chain is irreducible if it has no absorbing subchain, other than itself. 

Consider an absorbing state A in a chain M . If M is not Markov, SHARPE requires 
that M be acyclic, and in any case M must have the property that all of its states are either 
absorbing or transient The distribution of time until A is reached, conditional upon A even- 
tually being reached, and denoted by P A , is provided as output by SHARPE. The output 
appears in symbolic “exponomial” form. A sample input and output format for a constant- 
rate chain is shown in Figure 1-1. 

bind 

lam .022 
mu .004 
end 

markov ty2 

1 2 2*mu 

2 3 mu+lam 
1 3 lam 
end 

1 1. 
end 
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cdf(ty2) 

end 


CDF for system ty2: 

1.0000c+00t( 0) exp( 0.0000c+00 I) 
+ -2.0000c+00 t( 0) cxp(-2.6000c-02 l) 
+ 1.0000c+00t( 0) exp(-3.0000c-02 t) 

mean: 4.3590c+01 
variance: 1.7949c+03 


Figure 1-1 


The class of exponomial distribution functions is a natural one for the study of chains. 
In fact, in the Markov case, the function P A as above is exponomial as can be seen from the 
differential theory of constant-rate chains. Let Q be the “infinitesimal generator” matrix, 
< 7 ;y = X;j for i *j, q it = - £A. ( y. Then if P(t) is a row vector of functions 

j * ' 

[Pi, • • • , Pj, • • • ,/*„], where the n states of the chain arc numbered 1, • • • , n and 
Pi(t) is the probability of being in state i P [X (f ) = i ] at time r, we have 


A1 P\t) = P{t)Q, P(0) = P 0 , 

[Rcibman & Trivcdi]. Here P 0 is the vector of initial 
probabilities. Then 


A2 


?«) = V. 


where we use the matrix exponential e Ql = Putting Q into Jordan normal form 

i=o 1 • 

Q = SJS~\ it follows from [Molcr & Van Loan, p. 381] that 


A3 P(t) = P 0 Se J, S~\ 

If 7= diag(/i, ••• , J p ), then 
A4 e jl = diag(e' /,# , • • • , e Jpt ). 

If 


Ji = 


Xi 10 0 0 
0 Xf r . 0 0 

. . . . 0 , 

.... 1 
0 0 . . Xi 
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and ffliXm,' complex matrix, then 
A5 

/' = 


t X,// 

e te 


0 e 


V 


0 0 


(W,-1)! 


te 


V 


.V 


From formulas A1-A5 it follows that any P t (t ) can be written 

( 1 . 10 ) , _ x . 

7=1 Jt=0 


where m is the number of distinct eigenvalues of Q, Xj is the y-th distinct eigenvalue of Q 
and pj is the multiplicity of the factor (x-Xj) in the minimum polynomial of Q. Since Pj(t) 
must be a real function, the conjugacy condition must hold, and we have an exponomial 
function. If i is an absorbing state of a Markov chain, P, must be a distribution function, 
but may be defective. If i is the only absorbing state, then P t is a complete distribution, 
under the assumption made above that all states are either absorbing or transient. 


2. Chapman-Kolmogorov Equations 


We present a form of the Chapman-Kolmogorov equations for a semi-Markov process 
which will be convenient for our purposes. These are a form of the backwards equations in 
that they give probabilities (and densities) by summing over all epochs (times of a jump), and 
all results of a jump out of a given state, for the first jump from that state. Similar equations 
are stated, with proof, in [Ross], p. 93. We deal with state probability functions, and their 
“densities”, which integrate to give the probability function. Thus a density does not have 
to be the derivative of a distribution. 

We recall some of our semi-Markov terminology from section 1. In this section, 
i, j, k arc states; then C ik (T) = probability that a jump from i to k would be made by time 
T , given that i was entered at time 0, in the absence of competing transitions. These are the 
transition distributions, and they arc assumed to be independent and competing. (They arc 
distributions of the independent events resulting in jumps to the different states.) This is the 
same as the definition from section 1 provided that C gives a complete distribution. Recall 
that the unconditional transition function F,^(T) is the possibly defective distribution of a 
jump from i to k by time T, given that i was entered at time 0. The two distributions arc 
related by 


( 2 . 1 ) 


Fik(T) = [C'i *(Ontl ~Cij(t)]dt. 

o j *k 


In words, the probability of leaving i for k by T is the integral of the density of jumping 
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from / to k at /, times the probability of not having jumped to any other state by t . 


Next let E ik (t) = density of (first) entry time from i to *, given that i was entered at 
0. This is the density corresponding to a possibly defective probability distribution. We 
have 

T 

E ik (T) = dF lk + £ [dFij(xyE jk (T^ C), i *k, 

( 2 . 2 ) j*kt> 

T 

E kk (T)= X \dF kj (x)E jk (T^t). 
j * * o 

In words, for the first equation, the density of first arrival in k is the density of jumping to k 
plus the density which results from jumping to a third state at a time x < T , followed by a 
subsequent first arrival in k at T. 


The density Gj*(0 defined below will be used in forming state probability functions, 
and is necessary to compute probabilities of “combined states” in hierarchical, or Cartesian 
product, models. It is defined to be the density of entering k at t, given that you entered i at 
0. This differs from E ik in that state k may previously have been visited (after leaving state 
i). Then we have 

T 

G^T) = £«(T) + f £«( t) GufT-Vdx 
(2.3) 6 

T 

G; k (T) = E, k {T) + |e,* (x) Ga(T-x)rfx 


The first expression is a Voltcrra integral equation of the second kind, for which [Burton] is a 
clear introductory source. It is the same type of equation discussed in Feller’s article on 
renewal theory [Feller2] and which was previously treated as part of the theory of industrial 
replacement in [Lotka]. The equation is also pivotal in later studies of population and 
economic growth, as more recent references from chapter 2 of [Kohlas] indicate. 

The verbal description of the equation and formula above arc now given. For the equa- 
tion, “given that you start in it, the density of entering k at T is the density of a first entry, 
plus the density of having entered it at a previous time x and subsequently entering at T , 
summed over all x”. For the formula, “starting in j, the density of arriving in k is the sum 
of the density of first arrival, plus that of a previous first arrival followed by a subsequent 
arrival from k to k”. 


These quantities can be used to express the state probabilities. That is, P ik (T) is the 
probability of being in it at T, given that you entered i at 0. We let S k = X F kj b® 1116 

j *k 

holding time distribution in state k . Then 

T 

F ik(T) - |G,*(x)[1 - S k (T -x)]dx, 

T 

P w (7) = l - S k (T) + Jg«(x)[1 - S k (T-^))dx. 


(2.4) 


For the first formula, a description in words reads: “the probability of being in k equals the 
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density of arriving in k, and subsequently not leaving k, integrated up to the present time”. 

An alternative formulation of state probabilities, using only the £, y functions can be 
derived from the above equations. In fact, 

T 

P*(T) = \E ik ( x) P w (7'-r)rfx, 

(2.5) 6 

T 

p 'u(T) = i - s k (T) + |e m ( t) p„(7 - x)dx. 

Again, the advantages to the approach indicated by the above formulas can be summar- 
ized: 


1) The method encompasses both Markov and 
semi-Markov chains, 

2) Efr, i * k can be found by SHARPE, 

3) after which only one integral equation need be solved 
to determine a probability function, 

4) then Feller’s method of solving the renewal equation can be 
applied: 

5) the approach is well adapted for hierarchical modeling 
as will be seen. 


3. The SHARPE Solution Method 
Acyclic chains ( Markov and semi-Markov) 

The method adopted by SHARPE in this case is equivalent to an analysis of paths from 
* ‘initial states” to absorbing states. An initial state can be defined as one that has a non- zero 
probability at time 0. That is, if i is the stale, then the vector P 0 has a positive i-th com- 
ponent. We discuss this by examining each path separately, as in a ‘‘depth-first search”, 
whereas SHARPE is actually programmed to compute probabilities at states as they are suc- 
cessively reached in a “breadth-first” search. The difference is one of form. 

For the system to traverse a particular path in the (acyclic, directed) graph representing 
the semi-Markov process is an event, which is disjoint from the other events corresponding to 
the other paths. Therefore, to get the distribution of an absorbing state, the traversal distribu- 
tions of all paths leading from some initial state must be added, weighted by their probabili- 
ties of occurrence. We must find a traversal distribution from a given path, and a probability. 
Suppose the initial state is i 0 , the final state is i m . The path of concern o can be written 
*o» *i* • • • , i m - We define recursively 

T 

(3.1) Bm cn = 1. S'(H = jF' ijti Jt)B^\T-t)dt, 

where j = 0, • • • , m— 1. 

Then define D 0 = B°(T). The derivative and the convolutions are performed “symbolically” 
by SHARPE, within the class of functions Exp . 
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Letpo = P« 0 ,YP,VY •• Pu-^ACO). Here, 


P ij, , j — 0, ' , ftl— 1. 


Then P k (T ) = JjPoD a (T), where a runs over all paths ending in k. 

a 


Cyclic chains (Markov) 

In the (cyclic) Markov ease, SHARPE uses both matrix analysis and estimation in the 
transform domain to find the distribution of an absorbing state. There is no reason why this 
method could not be used to give probability functions at transient states, but at present, 
SHARPE does not do this. Such information may be useful, however, as the example of a 
phased mission points up. Here the "mission” proceeds in two phases, the second com- 
mencing at time T j. The models for the two phases are the same, but certain failure and 
reconfiguration distributions have changed due to a maintenance action ([Baker], personal 
communication ). The initial state probabilities for the model of the second phase are given 
by the state probabilities of the first phase at time T x . We indicate how to use SHARPE to 
find this information, however. 

Recalling the infinitesimal generator matrix Q of section 1, it is clear from (1.10) that 
its eigenvalues and their multiplicities are of great importance in finding the probability func- 
tions. Any real matrix has a Schur decomposition 

(3.2) Q = UHU T , 

where U T denotes the transpose of U , U is orthogonal (UU T = /). The nxn matrix H is to 
have a nearly upper triangular form. That is, it is block upper triangular, with diagonal 
blocks either of size 1x1 or 2x2. In particular, H is an upper Hessenberg matrix: it is 
upper triangular except for possible non-zero entries on the diagonal i = j+ 1 (just below the 
main diagonal). Then the eigenvalues of H, and hence of Q arc the lxl real scalars and the 
complex-conjugate pairs arising from the 2x2 blocks. In order to take Q to this form, one 
may first find 

G = L k L k _ | • • • Lq-Q Lq ■ • • L k , 

where G is in upper Hessenberg form, and Lj is a "Householder matrix”. A Householder 
matrix represents a rcllection through an (n-l)-dimcnsional hypciplanc orthogonal to a cer- 
tain vector v-. The details of this algorithm can be found in [Golub & Van Loan] p. 222. It 
is due to Wilkinson who exposited it in his book [Wilkinson]. Now G has a Q-R decompo- 
sition G = WR , where W is orthogonal (essentially a product of rotations) and R is upper 
triangular. Another algorithm implicitly finds G' = RW. Using G’ as the new Hessenberg 
matrix G , we repeat this process until the real Schur form is attained. Then the eigenvalues 
(which have not changed through any of these transformations) may be read off. 

Next, SHARPE must determine the coefficients a ijk of formula (1.10). By transforming 
the differential equation in formula A1 of section 1, one obtains 

sP(s)-P 0 = P(s)Q, 

or P (s i - Q) = P 0 . Each /*,($) is of the form 
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v %' fo - 

k k % (s + Xj) k ' 

determining the (3,y* is equivalent to finding a^. But for a particular choice of s, say Cl* we 
get P(Qr = Pq, T — (Ci/ - Q). Thus we have n equations for the n 2 unknowns {P,y* )• 
Similarly, setting’s = C* • • • . C«. for suitably chosen values, will give enough equations to 
determine the coefficients we seek. 

We now indicate how to use the SHARPE approach to determine transient state proba- 
bilities. This will work for any Markov chain; if general (semi-Markov) transitions are 
allowed, the technique is only good for states (if any) that satisfy the following. “If all tran- 
sitions out of the state of interest are removed, that which remains is a chain that is 

1) pure Maikov, possibly with cycles, or 

2) acyclic semi-Markov.” 

Call this condition Condition Q . It is rather remarkable that SHARPE, with some additional 
calculation, can treat certain semi-Markov chains with cycles. The computational techniques 
involved are amply illustrated by the examples at the end of the paper. Now the method is 
described in general terms. 

Given a (non- absorbing) state r, we are interested in P r (T) as a function. If there is a 
single initial state j, Pj(0) = 1, this is the same as Pjr(T). But using SHARPE, for any state 
/ * r , one may find £ ir (T). This is done by describing the chain to SHARPE, giving the 
transition rates and distributions as usual, but omitting any transitions out of r . This makes 
r into an absorbing state r\ We assign in the input to SHARPE, P ,(0) = 1, and all other 
initial state probabilities zero. Given that Condition Q holds, SHARPE can find the cumula- 
tive distribution function H r <{t ) of arrival into r', as well as the overall probability p r < of 
reaching r\ Consider 

L ir <T) = H r <T) p r >. 

This is the unconditional distribution of entering r’. The derivative of with respect to 
time is just E^T), since arrival and first arrival are identical for an absorbing state. Thus 
SHARPE has found the functions E ir , i * r . These are then used by means of the second 
part of (2.2) to find E n {T). The second part of (2.5) is an integral equation for the unknown 
function P rr . Once this has been solved, we need only perform the convolution integration 
of (2.5), first equation, to obtain P r which was the desired state probability function. 


4. Decomposition Methods 

Given a complex failure-repair-reconfiguration system, it is tempting to decompose it 
into independent subsystems, or ones that are nearly independent. Independence allows one 
to compute the probability of being in a given state (for each subsystem) by using the pro- 
duct formula. Since the computational cost of analyzing a system model increases geometri- 
cally with size, significant savings can be had if the system is decomposable in this manner. 
As an example, consider two triplex systems attached to a voter as shown in Figure 4-1. are 
both required to be functioning for system viability. In each subsystem, S b , the failure of 
two components causes a triplex failure. If the component failure probabilities are p a and 
p b , we have 
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( 4 - 1 ) Ps = Ps. + Ps„ - Ps . Ps „ . 

where p Sa = 3p a \l-p a ) + p?, 

Ps> = 3PbO~Pb) +Pb- 

Thus wc see that in forming system failure probability, we certainly do not need to consider 
separately all failure modes, such as "one unit in S a has failed, together with 2 units in S b ”. 

On the other hand, in Figure 4-2. the same failure conditions apply for S a and S b , but 
their “failures” are not independent events. We say that unit fl, is “isolated” when a, has 
failed (the voter has no access to it). To be isolated is as bad as failed. Thus when a, and 
b 2 arc failed, the system has failed, since the voter can see neither b\ nor b 2 . The “failure 
conditions” need to be explicitly analyzed. If we take the simplest Markov chain representa- 
tion of S a and S b , wc get Figure 4-3. At a given time t we again have 

(4-3) P St {t) = 3P a \t) - 2 P a \t) 

for the failure probability. But it is not clear how to obtain P s (t ) for the combined system. 
Figure 4-4a gives an “equivalent” Markov chain; its failure probability function is the same 
as for Figure 4-3. The corresponding chain for subsystem S b is shown in Figure 4-4b. In 
the “combined” model (not shown), certainly when one of the subsystems is in a failed 
state, the system is failed. Thus ( F a ,*) and (* , F b ) give system failure, where * is any 
non-failed state of the appropriate subsystem. But also for example (Oil, ioi) is a failed 
state. Due to independence of unit failures, this state’s probability is / > 011 / , (OI - = P a (t) P h (t). 
Examination gives 6 of these states so we finally get 

(4.4) P s (t) = P s JO (l - P St ( 0) + (1 - Ps m (0>Ps t + 6P a (t)P h U). 


Therefore, the combined model is the Cartesian product S a xS b with certain transitions 
modified. For example, the definition of Cartesian product implies that (101, ioi) is a state 
with a transition of rate 2\ a to (F a , ioi) and a transition of rate 2\ b to (101, F b ). Next, all 
“combined state” satisfying the failure condition are made absorbing. Thus the transitions of 
rate 2X a from (110, ioi) to (F a , ioi) and rate 2\ b from (110, ioi) to (110, F b ) are deleted. 
This reliability problem, of coupled nodes and sensors, can be solved in two ways: firstly by 
forming and solving the combined Markov model in the manner we have just indicated, and 
secondly by solving each of the two models S a and S b , not only for their failure probability 
distributions, but also for their state probability functions. These functions are then combined 
in some way, similar to (4.4), to give the distribution of the entire system S. 

The situation becomes more involved when the components admit of repair. The Mar- 
kov model for system S a is then shown in Figure 4-5, and the model for S b is similar. At 
time T, if wc are in a failed state of S a or of S b , the system S has certainly failed. If we are 
in a state such as 01 1 in S a and ioi in S b , the system is failed as well. But we may well be 
in an “up” state, such as 111 in S a and iii in S b and still have to consider that we are 
failed. This is because at some previous time t<T, we may have been in 011 and ioi simul- 
taneously which would have brought down the system. 

The combined model, called m, accurately reflects this stale of affairs: the state 
(Oil, ioi) has been made absorbing, so the transitions (named after their numerical rate): 


(4.6) 


14 


\i a : (Oil, ioi ) — > (111, ioi ) 
|i fc : (Oil, to/) — » (111, iot) 


do not exist. 

The problem is how to compute the failure contribution of combined states such as 
(Oil, ioi) without solving the combined model m. This is analogous to the non-repair situa- 
tion, with the difference that we cannot simply use the expression Pon(T)P ioi (T) as we did 
there. The expression we seek could be expressed in words as “the probability that at some 
time prior to T, the S a state was 01 1 and the S b state was ioi ”. 

In the semi-Markov case it is not feasible to find these "combined state” probabilities 
by using a Cartesian product model. If M and N are semi-Maikov chains, the Cartesian pro- 
duct MxN will generally not have the semi-Markov property. For example, Figure 4-6 dep- 
icts a two state Markov chain and a two state semi-Markov chain with hypoexponential dis- 
tribution C(f) = 1 - 2 e~' + e~ 2 ' . The combined model (Cartesian product) is then shown 
with distributions indicated. Since C(t) is not exponential and hence not memoryless, the 
density function of the transition (b pc) (b ,y) depends not only on the “local” time spent 
in state (bpc), but also on the entry time into (bpc), or the time spent in (ape). This violates 
the semi-Markov property. 

For this reason we do not work explicitly with the “combined model”. But we still 
consider ordered pairs of states, and say, informally, “the system is in state 04, X), where A 
is a state of M, and X is a state of N”. We a consider failure condition given by a pair 
(B, Y), BeM, YeN. Let Z ABXY (T) = the probability that M has entered state B while N 
was in state Y, or N entered state Y while M was in state B, at a time t, 0<t<T , given that 
M was in A at 0, and N was in X at 0. It should be helpful to look ahead to Figure 6-1 
which gives a good illustration of this situation. 

In case A * B or X * y, one can also interpret Z AB X r(T) as follows: the probability, 
given that M started in A and N started in X, that M has been in B simultaneous with N 
being in Y . The quantities arc determined by means of two fundamental equations. The first 
is: 


Z BB YY CO = j(pBB (X) Fjry (x) + P BB (x)'Gyy (x))t 1 ~ Z BB% yy (T -X )]dX. 

In words, the right-hand expression is the integral over t of the density of entering into the 
“state” (B , T), and subsequently never arriving again (to avoid counting arrivals twice). 
This is similar to a Chapman-Kolmogorov forward equation in that we integrate over densi- 
ties of the last jump into (B ,Y). The quantities G BB , P BB , Gyy, Pyy arc found as in section 
2 from the separate models M and N. Then (4.7) is an integral equation to be solved. Note 
that if we use a Laplace transform method, the expression G BB (x)-Pyy(x ) must be multiplied 
in the time domain, and then transformed, or else G BB (s) and Pyy(s) convolved before 
proceeding further. This is illustrated in the subsequent examples. Given Z BB >JT , one can 
find Z m X y by integrations: 


(4.8) 


T 


^ab ,xy(T) — {(G^b (x) P xj-(x) + P /\fl(x) Gxr(x))[l — Z BB yy(T x)]dx. 


The verbal interpretation of the right-hand expression is left to the reader. 
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5. Distributions from Mean and Variance 


Modem fault-tolerant computers, as used in high-reliability applications such as 
aerospace and nuclear plant control, employ architectural features beyond simple majority 
voting of independent processors. Instead, faulty components may be switched off, and 
spares activated; the system is changed upon detection of a fault. A simple system with such 
dynamic reconfiguration is shown in Figure 5-1. This depicts the triplex degradable to a sim- 
plex mentioned in section 1. Practice has generally borne out the constant failure rate 
assumption for electronic components during their active life span. But the ‘'reconfiguration 
distribution” ©(f) has been observed not to be exponential, as in [Finclli]. This transition 
includes the time necessary for the system to detect the presence of single fault, isolate the 
two components (one good and one bad), and remove them from service. 

It has been shown in [Butler & White] that giving the mean M and variance V of ©(f) 
is sufficient to determine P$(T) to within a few percent, assuming that M is much smaller 
than the reciprocal of the largest failure rate in the system S. Here, T is the mission time. 
That is, the system is assumed to fail slowly and reconfigure quickly. 

To check such reliability results, obtained by the SURE program package, one might 
use SHARPE on the same example. To do so would necessitate presenting co (f) in expono- 
mial form. Our goal in the present section is simply to give a way of determining ©(f) 
explicitly, knowing that it is a distribution with mean M and variance V . We utilize the 
method of Cox from his classic paper [Cox]. Three cases exhaust the possibilities. 

Case 1. Suppose M 2 = V. Then take o>(0 = 1 - where X = 1/A/. 

Case 2. If M 2 > V , let k = [M 2 IV] be the greatest integer less than M 2 IV. Consider the 
linear chain in Figure 5-2, consisting of k stages with rate kX and a final stage of rate y. 
Since the random variable “time to failure” is the sum of the independent transition times of 
the stages, the mean and variance are additive (respect the summation). See [Trivedi, p. 
192]. Thus 


(5.1) 


M 





V =k- 


2 \ 2 


k l X 





From this one obtains the formula 


(5.2) 


X 


M 


^M 2 -(1+j)(M 2 - V ) 

(l+~) 

k 


The practical way to get ©(f) in closed form is to find k,X,y and enter a SHARPE file for 
the linear chain. SHARPE will then find the desired distribution ©(f). Figure 5-3 shows the 
input and output formats. In a hierarchical fashion, SHARPE allows the cumulative distribu- 
tion function of this chain to be used in a "higher” system, eliminating the need ever to 
write the exponomial form of ©(f) explicitly. 


bind 

lambda .4167 
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gamma .16667 
end 

markov linear 

0 1 4*lambda 

1 2 4*lambda 

2 3 4*lambda 

3 4 4*lambda 

4 5 gamma 
end 

0 1 . 

end 

cdf(l incar) 
end 


CDF for system linear: 

8.5749C-02 t( 3) exp(-1.6668c+00 t) 
+ 3.2582e-01 t( 2) exp(-1.6668c+00 t) 
+ 6.1957C-01 t( 1) exp(-1.6668e+00 t) 
+ 1.0000c+00t( 0) exp( O.OOOOc+OO t) 
+ -1.5241c+00 t( 0) exp(-1.6667e-01 t) 
+ 5.2412e-01 t( 0) exp(-1.6668e+00 t) 

mean: 8.3997e+00 
variance: 3.7438e+01 


Figure 5-3 

A complete explanation of SHARPE input and output formats should be found in [Sahncr & 
Trivcdil]. Certain symbolic variable names such as "lambda” arc bound to a numerical 
value, the system is described by type (markov) and given a name (linear). The states and 
transitions, with rates, arc given in the following lines; after an "end”, state 0 is assigned 
initial probability 1. Then the cumulative distribution function (cdf) is requested of 
SHARPE. The cdf then appears in the output, using mantissa and exponent notation to 
describe floating point numbers, and "exp” to denote the exponential function. Hence the 
meaning of the first line of the output is 8.5749xl0 -2 f 3 Finally the mean and 

variance of the cdf arc given. 

Case 3. If M 2 < V, we take co(f) to be hyperexponential with distribution 

(5.3) (0(f) = 1 - pe'V 1 - qe 

where p+q= 1. According to [Trivedi, p. 212], 

M =2- + -f, 

p K 

v = + 2jl _ m 2 . 

p 2 A 2 

An effective iterative procedure to find p , q , p and X is to set 
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p = q = -5, 

\= 1.1/M, 

H = 1/(2 M - j). 

Then let (Step 1) X = (M 2 +V)/2 - q/X 2 . This should approximate p/p. 2 . Then set a new p 
value (Step 2) equal to ( M-qlX)IX . Multiplying out the following expression shows (Step 4) 
that pX (M-l/X)/(A,-p) should equal p, so we take this value as our new p. Finally, (Step 
5), take q = 1-p, and begin again at (Step 1), repeating until the computed mean M and 
variance V are as close to the given values as needed. This method is used in the next sec- 
tion to construct a distribution. 


6. Examples 

As our first example we consider the two models / and II depicted in Figure 6-1. 
Model / is a transient-fault detection mechanism. In state A the mechanism is functioning 
normally; in state B transient faults arc incorrectly diagnosed as being permanent. See [Lala] 
p. 20. In state C , a rare kind of error causes spurious signals to be sent external parts of the 
system, causing an overall crash. 

Model II represents the arrival of, and recovery from, transient faults over the entire 
system. State X represents the active presence of a fault, and Y the disappearance (absence), 
of faults. A similar model could be used to depict the error-producing and benign phases of 
a single “intermittent” fault. Several other reliability estimation packages besides SHARPE 
provide a capability for modeling the arrival and detection of permanent, transient, and inter- 
mittent faults to the system. See [Trivedi ct al] and [Bavuso & Peterson]. 

We wish to consider the system as being up when the two “subsystems” I and II are 
in states (A X), (A ,T), or (B ,Y) respectively. Whenever model I is in state C, the system is 
down, but also whenever I is in slate B at the same time as model II is in state X , we must 
consider the system to have crashed, since a transient fault is present but is incorrectly diag- 
nosed (as permanent). 

The “combined model”, with states {1, ••• ,5} is shown in Figure 6-2. The 
correspondence between the states of the combined model and the Cartesian product IxII is 
indicated. The system failure states arc absorbing. Note that there is no state corresponding 
to (C X)- it is superfluous. Thus, to find the failure probability at time T, one may lake 

(6.1) P4J) + P S (T). 

We are most interested in Pj(T), the probability of failure due to being in “state” 
( BX ) at some time t < T. For simplicity, suppose that the system starts life with model / 
in state A and model II in state X. In the notation of section 4, we see that 

(62) P 4 (r) = Z ab ^x(T). 

The total failure probability can also be obtained from the solution of the individual models I 
and II. The remaining part to be considered is for state C to be entered while model II is in 
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state Y . Since C is absorbing, we know that the density of entering C in model I is 
Gac = E AC * and we obtain 

(6.3) \ 

jE AC (x)P r (x)dx. 

Then adding expressions (6.2) and (6.3) gives the total failure probability, and should be 
equal to the distribution obtained from considering the absorbing states 3 and 5 of the com- 
bined model. 

In finding Z ABiX x(T ) as a closed-form exponomial function, we will need to know G B b< 
P m , G An , P AB , Gxx< and F xx , as indicated by cquatioas (4.7) and (4.8). We set coefficients 
in system / as 

a = .3, 

P = .5, 

X=A. 

We have dF BA (r) = .5e~ u , and since there is only one transition into B , it also follows that 

E ab = dF AB = .3<T 3 '. 

Thus by (2.2), 

(6.4) 

Transforming according to the construction in [Feller2] gives 

r " (l)=: (J+.3HJ+.6)- 

By (2.3), we know that 


E bb (T) = \dF BA { x) E ^ (T - x)dx. 


1 - E BB 


.15 

s 2 + .9s + .03 


Applying (2.3) also yields 


G BB (s ) = Eab + E^ ■ G BB 


.3 s ± .18 
s 2 + .9s + .03 


Next, note that S B is the “reliability” function of state B, given that the state was B at 
T = 0. Thus S B (t) = e -6 ', so by (2.4), we obtain 


Eab ( s ) = 


J+.3 

j 2 +.9s+.03 


From model If, we require G X x and Pxx- We take r= 2 and s= 3 which are of course 
intended to have didactic value if not realism. In a manner similar to that made in the com- 
putation for model / is obtained 
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(6.5) 


Exx(s) = 


6 

(s+3)(j+2) 


Gxx ( s ) - 


6 

s 2 + 5.v 


Then we have from (2.5), 


Pxx(s) = 


1 

s+2 


1 + 


6 

j 2 +5j 


j+3 
s 2 + 5s 


Now define H B {t) = G BB (t)P xx {t) + P BB (t)Gxx{t). _We are in fact interested in 
H b (s). To find this one must invert the transforms G BB (s), P^is) and so on, perform mul- 
tiplication and addition in the time domain, and then re-transform. A numerical mathematics 
package is helpful here. By this means one obtains 

L^s 3 -' 

H b (s) = — , where 

S>i s5 ~‘ 

: __ t 


i?=[ 6.1500 34.6350 13.0995 ] 

v>=[ 1.0000 11.8000 39.3700 26.9040 0.8859]. 

By (4.7) we have 

Z BB xx( s ) = H b (s ) •[— - Z BBwXX ]. 
s 

Writing Z BBXX in rational form as Z'^is )/z|°'(i ) yields 

(6.5) Z' B °r = HP, Zf = * <K P + H b B °‘), 

6 

giving Z B = Eh',* 6 ’*. 
i=i 

Herein =[ 1.0000 11.8000 45.5200 61.5390 13.9854 0.00 ]. 

Next we require H A (t) = G AB (t )/ > xx (t) + P AB (t )Gxx (r ). Setting H A (s) = H^m b A °\ we 
find that 

4 5 

(66) 7/^= H a = where 

i=t i=l 

i?=[ 0.3000 2.8500 10.7010 13.8294 ] 

?=[ 1.0000 11.8000 39.3700 26.9040 0.8859 ]. 

Next we apply (4.8) and obtain using a similar notation 


(6.7) 
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Z , A °P(s) = HjC P (Zl ot -sZ^ 

Z^(s) = s H^ Z^. 

The numerator of Z A has degree 7 in s , and the denominator has degree 9, but they have 5 
roots in common. When the corresponding factors have been canceled, what remains is 

z abjcx(s) = ~ 5 . where 

EM 5 " 

i'=l 


if=[ 1.0000 6.9000 17.7300 ] 

?=[ 1.0000 9.2000 21.6000 5.3790 0.0000 ]. 


Now Z Af} jx ( s ) has distinct poles, and its partial fraction expansion corresponds to the expli- 
cit exponomial form of Z^^ (r). Writing simply Z and Z, we have 


Z(j)= Z 




,•« i J+P/ 


where we write and fl in column form 


0.98884551031790 

-0.05848988319612 

0.08378848442687 

-1.01414411154865 


0 

-5.35172855471732 

-3.56645190997164 

-0.28181953531104 


Then of course Z(t) = £ a i e p, ‘ . 

;= l 

Consider the SHARPE input file for the combined model, together with the output 
information about node 4 in Figure 6-3. The distribution given is conditional upon entering 
the absorbing state 4. When multiplied by the given entrance probability, this gives the 
unconditional distribution, which is seen to agree with Z ABJCX (t) to 9 digits of accuracy. 

bind 
alph .3 
bet .5 
lam .1 
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r 2. 
s 3. 
end 

markov death 
1 4 alph 

1 2 r 

2 1 s 

2 3 alph 

3 2 bet 
3 4 s 

3 5 lam 
end 
1 1 . 
end 

cdf(dcath,4) 

end 


information about system death node 4 

probability of entering node: 9.888455 10c-01 

conditional CDF for time of reaching this absorbing state 

l.OOOOOOOOc+OO t( 0) exp( 0.00000000c+00 t) 

+ -1.02558398e+00 t( 0) cxp(-2.81819535e-01 t) 

+ 8.47336450C-02 t( 0) exp(-3.56645191e+00 t) 

+ -5.91496676e-02 t( 0) exp(-5.35172855e+00 t) 

mean: 3.62644539e+00 
variance: 1.2665 8 132e+01 

Figure 6-3 

In the second example we depict several physical components and their failure modes 
hierarchically. New features which were not present in the first example include 

1) determination of a simple exponential form of a distribution given its mean and variance, 

2) semi-Markov transitions, 

3) double poles in certain transition transforms, 

4) trigonometric solutions, 

5) neither coincident state is an initial state. 

The example is a simplification of one aspect of the Integrated Airframc/Propulsion Control 
System Architecture (I APS A). Sec [Cohen et al], p. 71. The nodes (sensor-processor pairs) 
form a reconfigurable duplex. The failure rate of each component is $ = .003, the resulting 
model is shown in Figure 64. Here C and E are failure states, but as in the previous exam- 
ple we are concerned with failure modes arising from coincident conditions on separate struc- 
tural levels. The transition function c(r) represents the distribution of system reconfiguration 
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lime. It is the distribution of the random variable which is the sum of the times taken by the 
duplex operating system to detect an error, isolate the faulty unit, and configure to a simplex 
system. Experimentation with faults injected into the system has yielded a mean time of 
.01 see with a variance of .001 see 2 . According to section 5, a hypcr-cxponomial distribution 
can be used for c(t). A SHARPE model, and output realizing this arc given in Figure 6-5, 
model "rcconfig”. 


bind 
p .025 
mu 7. 
lam 150 
end 

markov rcconfig 

1 3 mu 

2 3 lam 
end 

1 P 

2 l.-p 
end 

cdf(rcconfig) 

end 


CDF for system rcconfig: 


l.OOOOOOOOc+OO t( 0) exp( O.OOOOOOOOc+OO 
+ -2.50000000e-02 t( 0) exp(-7.00000000c+00 
+ -9.75000000C-01 t( 0) exp(-1.50000000c+02 


0 

t) 

0 


mean: 1.00714286e-02 
variance: 1.005641 16c-03 


Figure 6-5 


The other hierarchical component of the system is a dual partition network to which the 
nodes arc attached. For simplicity we assume that either of two states can hold: both parti- 
tions are functioning, or else one partition is functioning and the other is undergoing repair 
(by configuring in a spare communication link). The '‘degraded” network is fully functional 
when the "node” system / is in either a stable duplex or simplex mode. However, the 
overall system cannot tolerate a simultaneous partition repair and duplex-to-simplex 
reconfiguration. The two-state model in Figure 6-6 illustrates the communication network, 
model//. 

The partition failure rate is taken as a constant a = .01; due to a rather complete under- 
standing of the link repair mechanism, the repair distribution b(t) is precisely known and is 
shown in Figure 6-7 (model net-repair). As indicated in the SHARPE output, the mean and 
variance of repair are roughly .02sec. and .0003 sec 2 respectively. Note the factor of t in 
one of the terms of b(t). 
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bind 

p .02 
mu 16. 
lam 100. 
end 

markov net-repair 
1 2 mu 

3 4 lam 

4 5 lam 
end 
IP 

3 l.-p 
end 

cdf(net-repair) 

end 


CDF for system net-repair: 


+ 1.00000000e+00 t( 0) exp( 0.00000000e+00 t) 
-9.80000000e+01 t( 1) exp(-1.00000000e+02 t) 
+ -9.80000000e-01 t( 0) exp(-1.00000000e+02 t) 
+ -2.00000000e-02 t( 0) exp(-1.60000000e+01 t) 


mean: 2.08500000e-02 
variance: 3.09527500e-04 


Figure 6-7 


In the notation of the last example we are concerned with the function Z^^yiT). This 
is the probability given that model / begins (at t= 0) in A and model II begins in X, that 
before the time t — T model I has been in B simultaneous with model II being in state Y. 
to this end one must find, for model I, the quantities G BB , P BB , G/&, and Pab . For model 
II , one seeks Gyy , Pyy * and P \y> 

Since B is not a recurrent state, we immediately obtain G BB = 0 and thus 
P BB (J) = 1 - S b (T) from (2.4), second equation. A calculation of the distributions F BC and 
F bd yields 

S B = F bc + F bd = 1 - — qe~^** . 

Since G^ = E M = dF AB = , we also have 

T 

P ab(T) = - S (T -x)]dx. 

We could also obtain P^ directly from the SHARPE model in Figure 6-8. The quantity P M 
is obviously e~^‘ , and the SHARPE output gives the total failure distribution, that is 
P AC + Pad » s0 P ab is 1 minus the sum of these two quantities. 



24 


bind 
phi .003 
pC .025 
qC .975 
mu 7. 
lam 150. 
end 

semimark nodes 

1 2 cxp(2*phi) 

2 3 cxp(phi) 

2 4 genN 
1,0, ON 
-pC,0,-muN 
-qC, 0,-lam 
end 

1 1 . 

end 

cdf(nodcs) 

end 

CDF for system nodes: 

1.00000000c+00t( 0) exp( 0.00000000c+00 t) 
+ -1.00006044e+00 t( 0) cxp(-6.00000000e-03 t) 
+ 2.14377590C-05 t( 0) exp(-7.00300000c+00 t) 
+ 3.90007800c-05 t( 0) cxp(-1.50003000e+02 t) 

Figure 6-8 


Note how in the model, the semi-Markov transition is entered as a 
general distribution. 

Converting to the s -domain, one has G^ (j ) = 2(j)/(.r +2<J>) and 

P, D ( r) — TO. 2 + — 2 

} ^+2<> s+rjH-X. 

Also, Gyy — Eyy 4" Eyy 'Gyy from (2.3). In fact Gjy(i) = GyPIGyy where 


Gif = 1 0 3 x(0.00000320j 2 + 0.09864000s + 1.6000) 

Gyy = 10 5 x(0.00001000s 4 + 0.002 160 1000.9 3 + 0.13202156800s 2 + 1.60033360000.9 + 0). 


T 

Next, from (2.3), second equation, we have G X y(T) = E XY (T ) + jE xr (i)G Y Y (T so 


G xy (s) = 


.01 

.9 +.01 
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Next, from (2.4), letting L, denote Laplace transform 
P X y(s) = Gxy(s)L s [ 1-MOl = Gxy(x)' 


100 V . V , 1-V 

+ *r 


(j+100) 2 S+IOO s+\6 
Here V = .98 as indicated in Figure 6-7 (model repair-net) 

We begin computing the quantities that govern the coincident states. Firstly, since 

Gbb( 0 = 0 we have from (4.7) 

r 

z bb,yy(T) = - 2 bb ,yy ( OU *. 

where H BY (t) = P B b(0^yy(0- Solving yields 


Z BB ,YY ~ 


s(H$ + Hof) 


After some simplification, one arrives at 

_ 10 v. 

z bbM s ) = 


Y= 10" 5 x 


t = 


4.1360951 

2.6757974 - 1220.9445298i 
2.6757974 + 1220.9445298i 
0.1175371 
-6.5008554 

0.2214496 - 1 1.705 1909i 
0.2214496 + 11.7051909i 
0.0217117 
-3.5689825 


0 

-2.500031 + 0.001 565i 
-2.500031 - 0.001565i 
-1.660038 
-1.500127 

-1.070078 + 0.009775i 
-1.070078 - 0.009775i 
-0.230031 
-0.070032 


Manipulation of (4.8) yields the formulas 

zav = - * 37 > 

hot i j bot 7 hot 

Z ABJCY - h ax '^by- 

Here H m ( t) = G AB (t)P XY (t) + P^OXwO)- We present explicitly; its value when 
j=0 is of interest in that it represents the long-term or steady-state arrival density. Since in 
practice Z BBtYY (t) 5s very small, formula (4.8) shows that the long-term probability of ending 
up in our coincident failure state (B ,F), instead of one of the other failure states, should be 
very close to this number which is 3.09X10" 4 . Approximating our scmi-Maricov models by 
constant rate models gives an estimate of 2.99x1 CT 4 for this probability. We do not give 
Z AB xy(t) explicitly, since there ate many terms. There is a strong temptation to simplify 
Z (j ) by canceling roots in numerator and denominator which seem equal or very close, but 
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this is a numerically delicate procedure. Instead we give //^(r) explicitly from the partial 
fraction expansion. 

— _ L io 

Hax(s) = h$/h$ = x<yCf+p,.) 

J m t 


We obtain 


<t= 10^x 

-0.000038 + 0.001930i 
-0.000038 - 0.001930i 
-0.389927 
-0.214333 

-0.596018 + 29.687 109i 
-0.596018 - 29.687 109i 
-0.074953 
1.854995 
0.016351 


-2.500079 + 0.009899i 
-2.500079 - 0.009899i 
-1.500030 
-0.070030 

-1.000109 + 0.009899i 
-1.000109 - 0.009899i 
-0.160062 
-0.000060 
-1.000063 


Then writing 

8 . . . to 

i=i i=l 

we get 

?= <t = 

0 l.OOOOc+OO 

1.2000C-04 9.7306e+02 

1.0796e-01 3.8452c+05 

3.8301 c+01 7.9588c+07 

6.7934c+03 9.2674c+09 

6.2567C+05 6.0156c+U 

2.7433e+07 1.9872c+13 

4.290 le+08 2.6289c+14 

1.9499C+09 1.0529C+15 


Letting Oj = dj + eji, p ; = Uj + vji, where i = V(-l), we get 

(7.7) // M (0 = o 3 e p3 ' + a 6 e Pe> + o 7 e p7 ' + <5 % e w + o 9 e p *' + 

2e“ l< (d 1 cosv 1 f + e,sinvjf) + 2e“ 4, (rf 4 cosv 4 r + e 4 sinv 4 r). 
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7. Conclusions 

An important recent approach in reliability (and performance) theory is found in the use 
of closed-form, analytical solutions. One advantage is that this approach lends itself very 
well to models which are built up of smaller submodels in a hierarchical fashion. In this 
manner fault arrival behavior, system response, architectural fault-tolerance features, and 
operating system features can be analyzed separately. Each model yields an analytic expres- 
sion, which can then be put together according to formulas valid for the underlying stochastic 
process. 

In practice, closed-form hierarchical solution of dependability problems has seen limited 
use. One limitation is that in combining two models, new failure states may have to be con- 
sidered, which do not arise naturally from any particular failure state of either constituent 
submodel. We have presented a method for resolving such a situation. Using our formulas, 
it would seem feasible to incorporate the possibility of failure arising from the interaction of 
different hierarchical levels into a solution package such as SHARPE. The point of view we 
have presented emphasizes certain density functions and distributions arising in the study of 
semi-Markov processes. These quantities shed new light even on constant-rate processes, and 
are the key to solving models by decomposition. Large classes of (cyclic) semi-Markov 
chains can now be solved using the foundations laid in this article. The question of the nu- 
merical robustness of the closed-form approach is still an open one. This does not detract 
from the fact that “exponomial” methods are of great potential value in solving the problems 
of reliability modeling, which remain of both practical and theoretical interest 
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Model I (Transient Error Detection) 
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Model II (Error Arrival and Recovery) 
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